
Overview

The Hotel Booking Demand Dataset was collected from two hotels (a resort hotel and a city hotel) located in Portugal. It contains hotel booking information between July 1, 2015 and August 31, 2017. To keep the training of our predictive model fast, we select only data from 2016.

The original published dataset has 119390 rows and 32 columns. After filtering to instances from 2016, the resulting dataset has 56707 rows.
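The year filter can be sketched with pandas as follows. This is a minimal illustration on a toy table, not our exact preprocessing code; the column name `arrival_date_year` and the choice to filter on arrival year are assumptions.

```python
import pandas as pd

# Toy stand-in for the full bookings table.
# The column name `arrival_date_year` is an assumption.
bookings = pd.DataFrame({
    "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
    "arrival_date_year": [2015, 2016, 2016, 2017],
    "is_canceled": [0, 1, 0, 0],
})

# Keep only the bookings whose arrival year is 2016.
bookings_2016 = bookings[bookings["arrival_date_year"] == 2016].reset_index(drop=True)

# Compare the shape before and after filtering.
print(bookings.shape, "->", bookings_2016.shape)
```

On the real data, the same comparison of shapes is a quick check that the filtered row count matches the one reported above.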

We aim to explore the following questions using our exploratory data analysis:

  • Where do the guests come from?
  • How does price vary over the year?
  • Which factors could influence cancellations at both hotels?

Exploratory Data Analysis

Home Country of Guests

Comment: As shown in the choropleth maps and the donut chart, Portuguese guests form the largest group (42.8%), with British and French guests coming second and third.
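A share like the one quoted above can be computed from a country column with a normalized value count. The sketch below uses made-up rows and assumes a column named `country` holding ISO country codes; the numbers it prints are not our results.

```python
import pandas as pd

# Toy guest table; `country` is an assumed column name.
guests = pd.DataFrame(
    {"country": ["PRT", "PRT", "GBR", "FRA", "PRT", "GBR", None]}
)

# Share of bookings per country in percent.
# value_counts drops missing countries by default.
country_share = (
    guests["country"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)

print(country_share.head(3))
```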

Price Fluctuation

Comment: We observed that the ADR (average daily rate) is fairly stable across the year for the city hotel, which is reasonable given its booking demand pattern. For the resort hotel, we see a sharp increase in ADR from May to November, with the price level peaking during summer. This finding is also intuitive, since we expect demand for the resort hotel to increase in summer.
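One way to obtain a monthly price curve per hotel is to average the ADR by arrival month. The following is a hedged sketch on toy data; the column names `hotel`, `arrival_date_month`, and `adr` are assumptions, and the values are invented for illustration.

```python
import pandas as pd

month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Toy price table; column names are assumptions.
prices = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "City Hotel",
              "Resort Hotel", "Resort Hotel", "Resort Hotel"],
    "arrival_date_month": ["January", "January", "August",
                           "January", "August", "August"],
    "adr": [100.0, 110.0, 105.0, 50.0, 180.0, 200.0],
})

# Use an ordered categorical so months sort by calendar order, not alphabetically.
prices["arrival_date_month"] = pd.Categorical(
    prices["arrival_date_month"], categories=month_order, ordered=True
)

# Mean ADR per month (rows) and hotel (columns).
monthly_adr = (
    prices.groupby(["arrival_date_month", "hotel"], observed=True)["adr"]
    .mean()
    .unstack("hotel")
)

print(monthly_adr)
```

The ordered categorical matters for plotting: without it, a line chart would place April before January.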

Seasonal Fluctuation

Comment: The city hotel has the most bookings in spring (May-June) and autumn (October); the number of bookings for the resort hotel fluctuates less than that of the city hotel. Booking demand for the resort hotel goes down slightly from June to September.

Comment: Contrary to what we expected, there is no significant association between a low rate and a long lead time.
From the plot above, we can see that although the mean ADR trends downward as the lead time increases, individual data points vary widely. Also, there are only a few instances in which the lead time is greater than one year, which may explain the small variation at long lead times.
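The relationship between lead time and ADR can also be checked numerically, for example with a correlation coefficient and a per-bin summary that shows how many bookings support each mean. This is a sketch under assumptions: the column names `lead_time` and `adr`, the bin edges, and the toy values are all illustrative.

```python
import pandas as pd

# Toy data; column names and values are assumptions.
lead = pd.DataFrame({
    "lead_time": [5, 20, 45, 80, 150, 200, 300, 400],
    "adr": [120.0, 90.0, 130.0, 70.0, 110.0, 60.0, 95.0, 80.0],
})

# Pearson correlation between lead time (days) and ADR.
corr = lead["lead_time"].corr(lead["adr"])

# Bin lead time so that sparse long-lead bookings are visible as small counts.
# The bin edges are hypothetical.
bins = [0, 30, 90, 180, 365, float("inf")]
labels = ["0-30", "31-90", "91-180", "181-365", ">365"]
lead["lead_bin"] = pd.cut(
    lead["lead_time"], bins=bins, labels=labels, include_lowest=True
)

# Count, mean, and standard deviation of ADR within each bin.
summary = lead.groupby("lead_bin", observed=False)["adr"].agg(["count", "mean", "std"])

print(round(corr, 3))
print(summary)
```

A bin with a very small count, such as the ">365" bin here, has little spread simply because it has few points.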

Potential Influence Factors for Cancellation

Assigned Room Type

Based on the proportion plot, we found that the cancellation proportion is highest among guests who were assigned room type P for both hotels. However, from the distribution plot of assigned room types, we know that the number of assigned type P rooms is extremely low for both hotels, so we can ignore this room type at this point. Except for room type P, we found that for the resort hotel the cancellation proportion is distinctly high among guests who were assigned room types A, G, and H, which have comparable group sizes. For the city hotel, the cancellation proportion is highest among guests who were assigned room type A, with a comparable group size. This dataset also contains another variable called “reserved room type”, which is chosen by the guests themselves. Therefore, we might infer that the assigned room types mentioned above did not meet the expectations and needs of guests as well as their originally reserved room type did, which led them to cancel the order.
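The reasoning above, reading the cancellation proportion together with the group size and setting aside very small groups, can be sketched as follows. The column names `assigned_room_type` and `is_canceled`, the threshold `MIN_BOOKINGS`, and the toy rows are assumptions for illustration only.

```python
import pandas as pd

# Toy data: type P has a single booking, so its rate is unreliable.
rooms = pd.DataFrame({
    "assigned_room_type": ["A"] * 6 + ["D"] * 4 + ["P"] * 1,
    "is_canceled": [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Group size and cancellation proportion per room type.
# The mean of a 0/1 column is the proportion of ones.
by_room = rooms.groupby("assigned_room_type")["is_canceled"].agg(
    n="count", cancel_rate="mean"
)

# Hypothetical minimum group size below which a rate is not compared.
MIN_BOOKINGS = 3
reliable = by_room[by_room["n"] >= MIN_BOOKINGS].sort_values(
    "cancel_rate", ascending=False
)

print(by_room)
print(reliable)
```

In this toy table, type P shows a cancellation proportion of 1.0 from a single booking and is filtered out before ranking.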

Distribution Channel

Based on the proportion plot, we found that the cancellation proportion is highest among guests who booked through the TA/TO channel (“TA” means “Travel Agents” and “TO” means “Tour Operators”) for both hotels. Therefore, we could infer that this channel may have problems such as information asymmetry relative to the hotel website, which could mislead guests when they make their decision. The actual condition of the hotel might then not meet the expectations or needs of guests, so they cancel the order.

The Market Segment Effect

From the distribution bar plot, we found that more customers reserve rooms through online travel agents than through other market segments. From the proportion plot, we found that the cancellation proportions among online TA and offline TA/TO bookings of the city hotel are similar. The cancellation proportion among customers who reserved through Offline TA/TO is much lower.

Meal Type

Based on the plots above, the distribution of booked meal types shows that the number of FB meal bookings for both hotels and of SC meal bookings for the resort hotel is extremely small compared with the other meal types. Because results for these meal types might not be comparable to the others, we ignore their effects on the cancellation proportion. According to the proportion plot, the cancellation proportion of the FB meal is indeed distinctly high for the resort hotel, which is consistent with our concern about small group sizes, so we ignore this effect. Among the remaining meal types, for the resort hotel the cancellation proportion is highest among guests who booked the HB meal, with a comparable group size; for the city hotel, the cancellation proportion is highest among guests who booked the BB meal. Therefore, we could infer that these two meal types might not meet the expectations and needs of guests, which leads them to cancel the order.

The Repeated Guest Effect

Based on the proportion plot, we found that for both hotels, the cancellation proportion is markedly higher among non-repeated guests, which matches the common expectation for the repeated guest effect.

The Lead Time Effect

Based on the box plot, we found that, for both city and resort hotels, the median lead time of canceled reservations is higher than that of reservations that were not canceled.
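The medians that a box plot displays can be tabulated directly. The sketch below uses invented values and assumes the column names `hotel`, `is_canceled`, and `lead_time`.

```python
import pandas as pd

# Toy data; column names and values are assumptions.
stays = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 4,
    "is_canceled": [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    "lead_time": [100, 200, 150, 10, 30, 50, 80, 120, 5, 40],
})

# Median lead time per hotel (rows) and cancellation status (columns 0 and 1).
median_lead = (
    stays.groupby(["hotel", "is_canceled"])["lead_time"]
    .median()
    .unstack("is_canceled")
)

print(median_lead)
```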

The Deposit Type Effect

From the distribution bar plot, we found that the majority of reservations have the “no deposit” deposit type for both the resort and city hotels. Based on the proportion plot, we found that about 94% and 99% of reservations with a non-refundable deposit are canceled for the resort and city hotels respectively, which is far higher than for the other two deposit types. For reservations with a refundable deposit, the cancellation proportion of the city hotel is higher than that of the resort hotel.
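Proportions of this kind, split by hotel and deposit type, can be read from a row-normalized cross-tabulation. This is an illustrative sketch: the column names and the category labels `No Deposit` and `Non Refund` are assumptions, and the toy rows do not reproduce the percentages reported above.

```python
import pandas as pd

# Toy data; column names, labels, and values are assumptions.
deposits = pd.DataFrame({
    "hotel": ["City Hotel"] * 6 + ["Resort Hotel"] * 3,
    "deposit_type": ["Non Refund", "Non Refund",
                     "No Deposit", "No Deposit", "No Deposit", "No Deposit",
                     "No Deposit", "No Deposit", "Non Refund"],
    "is_canceled": [1, 1, 0, 0, 1, 0, 0, 1, 1],
})

# Each row is one (hotel, deposit type) group; normalize="index" makes
# each row sum to 1, so column 1 is the cancellation proportion.
cancel_share = pd.crosstab(
    [deposits["hotel"], deposits["deposit_type"]],
    deposits["is_canceled"],
    normalize="index",
)

print(cancel_share)
```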

The Special Request Effect

From the distribution bar plot, we found that most customers have no special requests or only one special request. Based on the proportion plot, we found that, for the city hotel, the cancellation proportion is highest when guests made no special request. Similarly, for the resort hotel, the cancellation proportion is higher when guests made no special request.

The Customer Type Effect

From the distribution bar plot, we can see that the majority of customers at the two hotels were transient customers, followed by transient party customers. There are only a few group or contract customers. Based on the proportion plot, we found that for the city hotel, the cancellation proportion is markedly higher among contract and transient customers, while for the resort hotel, it is markedly higher among transient party and transient customers.